Could you please help me with a regex?

I want to find all repeating patterns that start with 1-2 digits, contain some text, strange characters and numbers, and finish with 4 digits.

For example,

    import re

    ref = """ 1. Lieber, C. M. The incredible shrinking circuit. Sci. Am. 285, 58±64 (2001).
    2. Cui, Y. & Lieber, C. M. Functional nanoscale electronic devices assembled using silicon nanowire
    building blocks. Science 291, 851±853 (2001).
    3. Wang, J. F., Gudiksen, M. S., Duan, X. F., Cui, Y. & Lieber, C. M. Highly polarized photolumi-
    nescence and photodetection from single indium phosphide nanowires. Science 293, 1455±1457
    (2001)."""

    quer = r"\d{1,2}\.\s+.+\(\d{4}\)\."
    ansre = re.findall(quer, ref, re.DOTALL)
    print(len(ansre))
It finds only one match.

If I use

    quer = r"\d{1,2}\.\s+"

I find 23 matches, but of course this captures only the beginning of each reference, i.e. the 1-2 digits.

If I add .+, I find only one match:

    quer = r"\d{1,2}\.\s+.+"

I do not know how to express that any characters can appear between the 1-2 leading digits and the 4 digits at the end. If I use .+ for this, the regex captures the whole text; it does not stop at the first 4 digits.
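A minimal sketch of why only one match comes back, using the short sample from the question (the variable names are illustrative). With re.DOTALL, a greedy .+ first runs to the end of the string and then backtracks only until \(\d{4}\)\. can match, which happens at the last year in the text, not the first.

```python
import re

# Short sample from the question: three references, the last one wrapped over three lines.
ref = """ 1. Lieber, C. M. The incredible shrinking circuit. Sci. Am. 285, 58±64 (2001).
2. Cui, Y. & Lieber, C. M. Functional nanoscale electronic devices assembled using silicon nanowire
building blocks. Science 291, 851±853 (2001).
3. Wang, J. F., Gudiksen, M. S., Duan, X. F., Cui, Y. & Lieber, C. M. Highly polarized photolumi-
nescence and photodetection from single indium phosphide nanowires. Science 293, 1455±1457
(2001)."""

greedy = r"\d{1,2}\.\s+.+\(\d{4}\)\."

# With re.DOTALL, .+ grabs everything up to the end of the string (newlines included),
# then backs off one character at a time until "(dddd)." fits, i.e. at the LAST year.
m = re.search(greedy, ref, re.DOTALL)
print(m.start(), m.end(), len(ref))  # the match ends exactly at the end of the text
print(len(re.findall(greedy, ref, re.DOTALL)))  # 1: all three references in one match
```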

Thank you!
Oct 4 '14 #1
Rabbit
You need to define what you mean by "contains some text" and "strange characters". Those terms are too vague, and I have no idea what you mean by them.

But if your goal is to break out those references, then you can use this:

    \d{1,2}\.[^(]+\(\d{4}\)\.

This is assuming that your test data is representative and there are never any parentheses except for the ones around the year at the end of the reference.
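A quick way to try this suggestion in Python, on the short sample from the question (the names pattern and refs are only illustrative). Because [^(] is a negated class, it also matches newlines, so re.DOTALL is not needed; the sketch relies on the assumption stated above that the only "(" in each reference is the one before the year.

```python
import re

ref = """ 1. Lieber, C. M. The incredible shrinking circuit. Sci. Am. 285, 58±64 (2001).
2. Cui, Y. & Lieber, C. M. Functional nanoscale electronic devices assembled using silicon nanowire
building blocks. Science 291, 851±853 (2001).
3. Wang, J. F., Gudiksen, M. S., Duan, X. F., Cui, Y. & Lieber, C. M. Highly polarized photolumi-
nescence and photodetection from single indium phosphide nanowires. Science 293, 1455±1457
(2001)."""

# [^(]+ : one or more characters that are not "(" (newlines included),
# so each match stops at the first "(" it meets, which must then start "(dddd).".
pattern = r"\d{1,2}\.[^(]+\(\d{4}\)\."

refs = re.findall(pattern, ref)
print(len(refs))  # 3
for r in refs:
    print(r.split("\n")[0])  # first line of each reference
```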
Oct 4 '14 #2
gintare
The problem is that there can be parentheses in the text.
I do not know how to define "any character" between the start pattern

    \d+\.

and the end pattern

    \d+\)\.

If I use .+ for the part in between, it captures all the text, but I need one match per reference, i.e. as many matches as there are lines starting with the start pattern, each one ending at the end pattern.
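A sketch of one alternative that was not suggested in the thread: since every reference in this data begins at the start of a line with its number, each match can run lazily until the next line that begins with "<number>. " (or the end of the text), using a lookahead so the next number is not consumed. This assumes that no wrapped continuation line starts with 1-2 digits, a period and whitespace. The three references are copied from the full text posted later in the thread; number 17 contains "(ed.)". The names no_paren, by_line_start and refs are illustrative.

```python
import re

# Three references from the full text in this thread; number 17 contains "(ed.)".
text = """16. Helveg, S. et al. Atomic scale imaging of carbon nanofibre growth. Nature 427, 426±429 (2004).
17. Massalski, T. B. (ed.) Binary Alloy Phase Diagrams 2nd edn Vol. 1 369±371 (ASM International,
Materials Park, Ohio, 1990).
18. Gupta, R. P., Khokle, W. S., Wuerfl, J. & Hartnagel, H. L. Diffusion of gallium in thin gold films on
GaAs. Thin Solid Films. 151, L121±L125 (1987)."""

# The negated-class pattern cannot cross the "(" of "(ed.)", so reference 17 is silently skipped.
no_paren = re.findall(r"\d{1,2}\.[^(]+\(\d{4}\)\.", text)
print(len(no_paren))  # 2

# Assumption: each reference starts a line with "<1-2 digits>. ".
# (?m): ^ matches at every line start; (?s): . also matches newlines.
# .*? stops just before the next numbered line, or at the end of the text (\Z).
by_line_start = re.compile(r"(?ms)^\d{1,2}\.\s.*?(?=^\d{1,2}\.\s|\Z)")
refs = [m.group().strip() for m in by_line_start.finditer(text)]
print(len(refs))  # 3
```

The design choice here is to define a reference by where the next one begins rather than by what characters it may contain, so parentheses, slashes or accented letters inside a reference do not matter.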
Oct 4 '14 #3
gintare
I have tried this:

    \d+\.\s+[\sa-zA-Z0-9±&,:;.)(-]+

but it captures only 3 of the 21 references (numbers 3, 11 and 15 in the text below). I do not know why. The full text to be searched looks like this:
    ref="""
    1. Haraguchi, K., Katsuyama, T., Hiruma, K. & Ogawa, K. GaAs p-n junction formed in quantum wire
    crystals. Appl. Phys. Lett. 60, 745 747 (1992).
    2. Björk, M. T. et al. Nanowire resonant tunneling diodes. App. Phys.Lett. 81, 4458±4460 (2002).
    3. Thelander, C. et al. Single electron transistors in heterostructure nanowires. Appl. Phys. Lett. 83,
    2052±2054 (2003).
    4. Wagner,R. S.in Whisker Technology (ed. Levitt, A. P.) 47±119 (Wiley. New York, 1970)
    5. Hiruma, K. et al. Growth and optical properties of nanometer scale GaAs and InAs whiskers. J. Appl.
    Phys. 77, 447±462 (1995).
    6. Duan, X. & Lieber, C. M. General synthesis of compound semiconductor nanowires. Adv. Mater. 12,
    298±302 (2000).
    7. Duan, X. & Lieber, C. M. Laser assisted catalytic growth of single crystal GaN nanowires. J. Am. Chem.
    Soc. 122, 188±189 (2000).
    8. Ohlsson, B. J. et al. Size , shape , and position controlled GaAs nano whiskers. Appl. Phys. Lett. 79,
    3335±3337 (2001).
    9. Kamins, T. I., Stanley Williams, R., Basile, D. P., Hesjedal, T. & Harris, J. S. Ti catalyzed Si nanowires by
    chemical vapor deposition: Microscopy and growth mechanisms. J. Appl. Phys. 89, 1008±1016 (2001).
    10. Ohlsson, B. J. et al. Growth and characterization of GaAs and InAs nano whiskers and InAs/GaAs
    heterostructures. Physica E 13, 1126±1130 (2002).
    11. Buffat, P. & Borel, J. P. Size effect on the melting temperature of gold particles. Phys. Rev. A 13,
    2287±2298 (1976).
    12. Björk, M. T. et al. One dimensional steeplechase for electrons realized. Nano Lett. 2, 87±89 (2002).
    13. Gudiksen, M. S., Lauhon, L. J., Wang, J., Smith, D. S. & Lieber, C. M. Growth of nanowires superlattice
    structures for nanoscale photonics and electronics. Nature 415, 617±620 (2002).
    14. Wu, Y., Fan, R. & Yang, P. Block by block growth of single crystalline Si/SiGe superlattice nanowires.
    Nano Lett. 2, 83±86 (2002).
    15. Baker, R. T. K. Catalytic growth of carbon filaments. Carbon 27, 315±323 (1989).
    16. Helveg, S. et al. Atomic scale imaging of carbon nanofibre growth. Nature 427, 426±429 (2004).
    17. Massalski, T. B. (ed.) Binary Alloy Phase Diagrams 2nd edn Vol. 1 369±371 (ASM International,
    Materials Park, Ohio, 1990).
    18. Gupta, R. P., Khokle, W. S., Wuerfl, J. & Hartnagel, H. L. Diffusion of gallium in thin gold films on
    GaAs. Thin Solid Films. 151, L121±L125 (1987).
    19. Massalski, T. B. (ed.) Binary Alloy Phase Diagrams 1st edn Vol. 1 191±192 (ASM International, Metals
    Park, Ohio, 1986).
    20. Magnusson, M. H., Deppert, K., Malm, J. O., Bovin, J. O. & Samuelson, L. Gold nanoparticles:production, reshaping, and thermal charging. J. Nanoparticle Res. 1, 243±251 (1999).
    21. Bakkers, E. & Verheijen, M. A. Synthesis of InP nanotubes. J. Am. Chem. Soc. 125, 3440±3441 (2003). """
Oct 4 '14 #4
gintare
I found out why the regex captured references 3, 11 and 15: the / character is not in the character class, so the match stops at each /.
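A small check of this diagnosis, assuming the character class from the previous post (the names attempt and line are illustrative): re.match returns the longest run the class allows, and it stops right before the first character that is not listed, here the / in "InAs/GaAs" from reference 10.

```python
import re

# Character class from the earlier attempt; note that "/" is not in it.
attempt = r"\d+\.\s+[\sa-zA-Z0-9±&,:;.)(-]+"

# Reference 10 from the full text, first line only.
line = "10. Ohlsson, B. J. et al. Growth and characterization of GaAs and InAs nano whiskers and InAs/GaAs"

m = re.match(attempt, line)
print(m.group())       # everything up to "...whiskers and InAs"
print(line[m.end()])   # /  (the first character outside the class)
```

Any other character missing from a whitelist class (for example the ö in "Björk") cuts a match short in the same way, which is why listing allowed characters is brittle for free text.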
If I use

    \d+\.\s+[\sa-zA-Z0-9/±&,:;.)(öÖäÄåÅ-]*\d{4}\)\.

all the text is captured, the same as if I were using

    \d+\..+\d{4}\)\.
So the question remains: how do I match any characters between two other patterns? In my case I want to capture 21 references.
Oct 4 '14 #5
Rabbit
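By default, regex quantifiers are greedy: .+ first matches as much text as it can and then gives back only as much as the rest of the pattern needs. That's why the match doesn't stop after the first reference but runs on to the last year in the text. You need to make the quantifier non-greedy (lazy).

You can do that by putting a ? after the quantifier (*, +, ?, or {m,n}):

    \d{1,2}\..+?\(\d{4}\)\.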
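A minimal sketch of this suggestion applied to the short sample from the question (the names greedy and lazy are illustrative). Because the references wrap over several lines, re.DOTALL is still needed so that . can match newlines. The first lines show the lazy forms of other quantifiers on a toy string.

```python
import re

# Lazy quantifiers take as little as possible.
print(re.match(r"a+?", "aaaa").group())      # a
print(re.match(r"a{2,4}?", "aaaa").group())  # aa
print(repr(re.match(r"a*?", "aaaa").group()))  # '' (empty match)

ref = """ 1. Lieber, C. M. The incredible shrinking circuit. Sci. Am. 285, 58±64 (2001).
2. Cui, Y. & Lieber, C. M. Functional nanoscale electronic devices assembled using silicon nanowire
building blocks. Science 291, 851±853 (2001).
3. Wang, J. F., Gudiksen, M. S., Duan, X. F., Cui, Y. & Lieber, C. M. Highly polarized photolumi-
nescence and photodetection from single indium phosphide nanowires. Science 293, 1455±1457
(2001)."""

greedy = r"\d{1,2}\..+\(\d{4}\)\."
lazy = r"\d{1,2}\..+?\(\d{4}\)\."  # .+? stops at the FIRST "(dddd)." after each number

print(len(re.findall(greedy, ref, re.DOTALL)))  # 1
print(len(re.findall(lazy, ref, re.DOTALL)))    # 3
```

On the full text posted earlier, this pattern still assumes every reference ends in "(year).". Reference 4 ends "1970)" with no period, and references 17 and 19 end ", 1990)." and ", 1986)." with no "(" directly before the year, so each of them is merged with the reference that follows it.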
Oct 4 '14 #6
